Fix PDF loading failures from trailing null padding and PdfName cache eviction by Mythie · Pull Request #57 · LibPDF-js/core

Mythie · 2026-03-21T02:48:40Z

Fixes #54.

findStartXRef misses startxref behind trailing null padding

Some systems pad PDFs with null bytes after %%EOF. When the padding is longer than the 1024-byte backward search window, the search lands entirely in padding and never sees startxref. We now skip trailing whitespace (including null bytes) first, then search.
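
A minimal sketch of the idea, not the code in this PR: trim trailing PDF whitespace to find the effective end of file, then run the 1024-byte backward search from there. The names `isPdfWhitespace`, `effectiveEnd`, and `findStartXRefSketch` are hypothetical and chosen for illustration only.

```typescript
// Hypothetical helpers for illustration; not the library's actual API.
const SEARCH_WINDOW = 1024;

// PDF whitespace bytes: NUL, TAB, LF, FF, CR, SPACE.
function isPdfWhitespace(byte: number): boolean {
  return (
    byte === 0x00 || byte === 0x09 || byte === 0x0a ||
    byte === 0x0c || byte === 0x0d || byte === 0x20
  );
}

// Index one past the last non-whitespace byte (0 if the buffer is all padding).
function effectiveEnd(bytes: Uint8Array): number {
  let end = bytes.length;
  while (end > 0 && isPdfWhitespace(bytes[end - 1])) end--;
  return end;
}

// Offset of the last "startxref" keyword inside the search window, or -1.
function findStartXRefSketch(bytes: Uint8Array): number {
  const keyword = new TextEncoder().encode("startxref");
  const end = effectiveEnd(bytes);
  const windowStart = Math.max(0, end - SEARCH_WINDOW);
  for (let i = end - keyword.length; i >= windowStart; i--) {
    let match = true;
    for (let j = 0; j < keyword.length; j++) {
      if (bytes[i + j] !== keyword[j]) {
        match = false;
        break;
      }
    }
    if (match) return i;
  }
  return -1;
}
```

With this shape, a file followed by several kilobytes of null bytes is searched from its real end rather than from the end of the padding.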

Brute-force recovery fails on streams with indirect /Length

During recovery, IndirectObjectParser has no lengthResolver, so a stream with /Length 42 0 R throws. If that stream is an ObjStm, its compressed objects are lost. Recovery now catches the failure and scans forward for endstream.
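
The fallback can be sketched roughly as follows, assuming the parser already knows where the stream data begins. `indexOfBytes` and `scanStreamData` are hypothetical names, and this is a heuristic: binary stream data that happens to contain the bytes "endstream" would be cut short.

```typescript
// Hypothetical helpers for illustration; not the library's actual API.

// First index >= from where needle occurs in haystack, or -1.
function indexOfBytes(haystack: Uint8Array, needle: Uint8Array, from: number): number {
  outer: for (let i = from; i + needle.length <= haystack.length; i++) {
    for (let j = 0; j < needle.length; j++) {
      if (haystack[i + j] !== needle[j]) continue outer;
    }
    return i;
  }
  return -1;
}

// Returns the stream data starting at dataStart, delimited by the next
// "endstream" keyword, or null if the keyword is never found.
function scanStreamData(bytes: Uint8Array, dataStart: number): Uint8Array | null {
  const marker = new TextEncoder().encode("endstream");
  const markerAt = indexOfBytes(bytes, marker, dataStart);
  if (markerAt === -1) return null;

  // Drop one end-of-line sequence (LF, CR, or CRLF) that precedes the keyword.
  let end = markerAt;
  if (end > dataStart && bytes[end - 1] === 0x0a) end--;
  if (end > dataStart && bytes[end - 1] === 0x0d) end--;
  return bytes.subarray(dataStart, end);
}
```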

PdfName LRU evicts names still held as PdfDict keys

PdfDict stores entries in a Map<PdfName, PdfObject>, which compares keys by reference. The 10k-entry LRU could evict names still in use as keys, so a later lookup received a fresh PdfName instance and dict.get("Root") silently returned undefined. Replaced with WeakRef + FinalizationRegistry.

Names stay cached as long as anyone holds a reference. Load test confirms the old code breaks under pressure.
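
To illustrate the failure mode, here is a small self-contained reproduction using stand-in classes. `Name` and `BoundedNameCache` are hypothetical simplifications of PdfName and its old LRU, with the limit shrunk from 10k to 2 so the eviction is easy to trigger.

```typescript
// Hypothetical stand-ins for illustration; not the library's actual classes.
class Name {
  readonly value: string;
  constructor(value: string) {
    this.value = value;
  }
}

class BoundedNameCache {
  private readonly entries = new Map<string, Name>();
  private readonly max: number;

  constructor(max: number) {
    this.max = max;
  }

  of(value: string): Name {
    const hit = this.entries.get(value);
    if (hit !== undefined) {
      // Re-insert to mark the entry as most recently used.
      this.entries.delete(value);
      this.entries.set(value, hit);
      return hit;
    }
    const created = new Name(value);
    this.entries.set(value, created);
    if (this.entries.size > this.max) {
      // Map iterates in insertion order, so the first key is the least recently used.
      const oldest = this.entries.keys().next().value as string;
      this.entries.delete(oldest);
    }
    return created;
  }
}

const cache = new BoundedNameCache(2);
const dict = new Map<Name, string>();
dict.set(cache.of("Root"), "catalog");

// Interning two more names pushes "Root" out of the cache.
cache.of("A");
cache.of("B");

// The cache now builds a new Name for "Root"; it is not the instance used as the key.
const lookedUp = dict.get(cache.of("Root"));
console.log(lookedUp); // undefined
```

The dictionary still holds its entry, but no caller can reach it through the cache any more, which is why the lookup fails silently rather than throwing.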

PDFs padded with null bytes beyond %%EOF (common when uploaded through systems that pad to block boundaries) caused startxref lookup to fail because the 1024-byte search window fell entirely within padding. Skip trailing whitespace to find the effective end of file before searching. Fixes #54.

During brute-force recovery, IndirectObjectParser has no lengthResolver, so streams with indirect /Length references (e.g. /Length 42 0 R) would fail to parse. This prevented object streams from being read, making their compressed objects invisible to recovery. Now scans forward for the endstream keyword as a fallback, matching the approach used by pdf.js and PDFBox. Partial fix for #54.

The LRU cache (max 10k) could evict PdfName instances still held as keys in PdfDict's Map<PdfName, PdfObject>, causing silent lookup failures via reference inequality. This manifests in long-running servers processing many PDFs with diverse name sets. Replace with a WeakRef-based cache (matching PDFBox's COSName approach): names stay interned as long as any live object holds a strong reference, and a FinalizationRegistry cleans up dead entries. Also expands the permanent cache with trailer keys (Root, Size, Info, Prev, ID, Encrypt) and high-frequency names (Subtype, Font, BaseFont, Encoding, XObject, Annots, Names). Closes #54.
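
A minimal sketch of a WeakRef-based intern cache in this style, assuming a runtime and TypeScript `lib` setting with ES2021 support for `WeakRef` and `FinalizationRegistry`. `InternedName` and `WeakNameCache` are hypothetical names, and the real implementation may differ in detail.

```typescript
// Hypothetical classes for illustration; not the library's actual API.
class InternedName {
  readonly value: string;
  constructor(value: string) {
    this.value = value;
  }
}

class WeakNameCache {
  // Names that are never collected (for example trailer keys).
  private readonly permanent = new Map<string, InternedName>();
  // Names that live only while something else references them.
  private readonly weak = new Map<string, WeakRef<InternedName>>();
  private readonly registry: FinalizationRegistry<string>;

  constructor(permanentNames: string[]) {
    for (const n of permanentNames) this.permanent.set(n, new InternedName(n));
    this.registry = new FinalizationRegistry<string>((key) => {
      // Remove the entry only if it has not been replaced by a live instance.
      const ref = this.weak.get(key);
      if (ref !== undefined && ref.deref() === undefined) this.weak.delete(key);
    });
  }

  of(value: string): InternedName {
    const pinned = this.permanent.get(value);
    if (pinned !== undefined) return pinned;

    const live = this.weak.get(value)?.deref();
    if (live !== undefined) return live;

    const created = new InternedName(value);
    this.weak.set(value, new WeakRef(created));
    this.registry.register(created, value);
    return created;
  }
}

const names = new WeakNameCache(["Root", "Size", "Info", "Prev", "ID", "Encrypt"]);
const customKey = names.of("CustomKey");
const entries = new Map<InternedName, number>([[customKey, 1]]);

// Interning many other names cannot displace "CustomKey" while `entries` holds it.
for (let i = 0; i < 50000; i++) names.of(`N${i}`);
const stillThere = entries.get(names.of("CustomKey"));
```

Because the dictionary holds a strong reference to its key, the cache keeps returning that same instance, so reference-equality lookups keep working regardless of how many other names are interned.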

github-actions · 2026-03-21T02:50:01Z

Benchmark Results

Comparison

Load PDF

Benchmark	Mean	p99	RME	Samples
libpdf	2.28ms	3.26ms	±1.5%	220
pdf-lib	39.39ms	44.97ms	±4.8%	13
@cantoo/pdf-lib	38.61ms	42.87ms	±2.5%	13

Create blank PDF

Benchmark	Mean	p99	RME	Samples
libpdf	58μs	131μs	±1.6%	8568
pdf-lib	411μs	1.44ms	±2.4%	1217
@cantoo/pdf-lib	443μs	1.67ms	±2.7%	1130

Add 10 pages

Benchmark	Mean	p99	RME	Samples
libpdf	103μs	277μs	±1.3%	4854
pdf-lib	540μs	2.00ms	±3.0%	927
@cantoo/pdf-lib	499μs	2.51ms	±3.7%	1005

Draw 50 rectangles

Benchmark	Mean	p99	RME	Samples
libpdf	323μs	916μs	±1.8%	1549
pdf-lib	1.77ms	6.72ms	±6.8%	285
@cantoo/pdf-lib	2.05ms	5.83ms	±5.7%	244

Load and save PDF

Benchmark	Mean	p99	RME	Samples
libpdf	2.38ms	4.71ms	±2.3%	210
pdf-lib	90.37ms	125.39ms	±10.4%	10
@cantoo/pdf-lib	156.46ms	161.20ms	±1.1%	10

Load, modify, and save PDF

Benchmark	Mean	p99	RME	Samples
libpdf	42.73ms	47.73ms	±3.9%	12
pdf-lib	86.39ms	95.34ms	±3.7%	10
@cantoo/pdf-lib	155.32ms	158.56ms	±1.3%	10

Extract single page from 100-page PDF

Benchmark	Mean	p99	RME	Samples
libpdf	3.68ms	5.84ms	±1.8%	136
pdf-lib	9.18ms	13.49ms	±2.5%	55
@cantoo/pdf-lib	9.75ms	12.24ms	±2.6%	52

Split 100-page PDF into single-page PDFs

Benchmark	Mean	p99	RME	Samples
libpdf	33.21ms	36.20ms	±2.6%	16
pdf-lib	87.83ms	90.92ms	±2.4%	6
@cantoo/pdf-lib	95.28ms	109.76ms	±8.9%	6

Split 2000-page PDF into single-page PDFs (0.9MB)

Benchmark	Mean	p99	RME	Samples
libpdf	612.13ms	612.13ms	±0.0%	1
pdf-lib	1.67s	1.67s	±0.0%	1
@cantoo/pdf-lib	1.73s	1.73s	±0.0%	1

Copy 10 pages between documents

Benchmark	Mean	p99	RME	Samples
libpdf	4.63ms	5.66ms	±1.4%	109
pdf-lib	12.15ms	14.56ms	±2.0%	42
@cantoo/pdf-lib	14.00ms	19.11ms	±3.5%	36

Merge 2 x 100-page PDFs

Benchmark	Mean	p99	RME	Samples
libpdf	14.86ms	21.36ms	±3.3%	34
pdf-lib	55.24ms	58.07ms	±2.2%	10
@cantoo/pdf-lib	64.76ms	67.93ms	±1.9%	8

Fill FINTRAC form fields

Benchmark	Mean	p99	RME	Samples
libpdf	21.13ms	24.63ms	±3.6%	24
pdf-lib	34.81ms	42.54ms	±5.3%	15
@cantoo/pdf-lib	36.24ms	44.21ms	±5.8%	14

Fill and flatten FINTRAC form

Benchmark	Mean	p99	RME	Samples
libpdf	20.08ms	35.25ms	±7.9%	25
pdf-lib	FAILED	-	-	0
@cantoo/pdf-lib	39.70ms	44.90ms	±4.7%	13

Copying

Copy pages between documents

Benchmark	Mean	p99	RME	Samples
copy 1 page	1.04ms	2.01ms	±2.5%	483
copy 10 pages from 100-page PDF	4.46ms	5.20ms	±1.0%	112
copy all 100 pages	7.34ms	9.47ms	±1.3%	69

Duplicate pages within same document

Benchmark	Mean	p99	RME	Samples
duplicate page 0	882μs	1.40ms	±1.1%	568
duplicate all pages (double the document)	873μs	1.61ms	±1.1%	573

Merge PDFs

Benchmark	Mean	p99	RME	Samples
merge 2 small PDFs	1.45ms	2.29ms	±1.3%	345
merge 10 small PDFs	7.76ms	9.32ms	±1.2%	65
merge 2 x 100-page PDFs	13.42ms	13.92ms	±0.8%	38

Drawing

benchmarks/drawing.bench.ts

Benchmark	Mean	p99	RME	Samples
draw 100 rectangles	544μs	1.30ms	±1.8%	920
draw 100 circles	1.28ms	3.15ms	±3.1%	392
draw 100 lines	518μs	1.23ms	±2.1%	966
draw 100 text lines (standard font)	1.56ms	2.30ms	±1.3%	321
create 10 pages with mixed content	1.37ms	2.70ms	±2.2%	366

Forms

benchmarks/forms.bench.ts

Benchmark	Mean	p99	RME	Samples
get form fields	3.61ms	8.65ms	±4.7%	139
fill text fields	11.92ms	18.64ms	±4.6%	42
read field values	2.88ms	3.69ms	±1.2%	174
flatten form	8.52ms	12.82ms	±3.1%	59

Loading

benchmarks/loading.bench.ts

Benchmark	Mean	p99	RME	Samples
load small PDF (888B)	57μs	130μs	±0.7%	8819
load medium PDF (19KB)	88μs	119μs	±0.5%	5681
load form PDF (116KB)	1.29ms	1.85ms	±0.9%	388
load heavy PDF (9.9MB)	2.17ms	2.61ms	±0.7%	231

Saving

benchmarks/saving.bench.ts

Benchmark	Mean	p99	RME	Samples
save unmodified (19KB)	108μs	248μs	±4.9%	4613
save with modifications (19KB)	753μs	1.42ms	±1.3%	665
incremental save (19KB)	160μs	324μs	±1.0%	3132
save heavy PDF (9.9MB)	2.28ms	2.80ms	±1.1%	220
incremental save heavy PDF (9.9MB)	8.38ms	10.01ms	±3.2%	60

Splitting

Extract single page

Benchmark	Mean	p99	RME	Samples
extractPages (1 page from small PDF)	1.04ms	2.18ms	±2.4%	481
extractPages (1 page from 100-page PDF)	3.92ms	6.93ms	±2.9%	128
extractPages (1 page from 2000-page PDF)	62.27ms	65.16ms	±1.7%	10

Split into single-page PDFs

Benchmark	Mean	p99	RME	Samples
split 100-page PDF (0.1MB)	32.09ms	37.83ms	±4.0%	16
split 2000-page PDF (0.9MB)	572.79ms	572.79ms	±0.0%	1

Batch page extraction

Benchmark	Mean	p99	RME	Samples
extract first 10 pages from 2000-page PDF	61.93ms	63.39ms	±1.2%	9
extract first 100 pages from 2000-page PDF	65.27ms	66.70ms	±1.5%	8
extract every 10th page from 2000-page PDF (200 pages)	71.17ms	75.55ms	±2.3%	8

Environment

Runner: Linux (X64)
Runtime: Bun 1.3.11

Results are machine-dependent.

Mythie added 3 commits March 21, 2026 12:45

Mythie merged commit f8cde4a into main Mar 21, 2026
6 checks passed

Mythie merged 3 commits into main from issue/54
